lab_1_yourinitials (e.g. mine will be lab_1_ah).

In the setup chunk in your RMarkdown document, attach the following packages:

tidyverse
here
sf
tmap

Note: you may need to install these packages if you don't already have them (recall: install.packages("packagename")).
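Here is a minimal sketch of what that setup chunk might contain. It assumes all four packages are already installed. Run any install.packages() calls in the Console, not in the document.

```r
# Attach the packages used in this lab
library(tidyverse)  # dplyr, tidyr, ggplot2, readr, stringr, forcats, ...
library(here)       # project-relative file paths
library(sf)         # simple features (spatial data)
library(tmap)       # thematic and interactive maps
```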
The data you’ll use (to start) is within the data/sf_trees subfolder. Use the here package to read in the sf_trees.csv file.
sf_trees <- read_csv(here("data","sf_trees","sf_trees.csv"))
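Why use here() instead of typing a path? here() builds the path from your project root, so the same line works whether the .Rmd is knitted from the project folder or from a subfolder. The sketch below shows the path it produces. The name trees_path is only for illustration.

```r
library(here)

# here() joins its arguments onto the project root directory
trees_path <- here("data", "sf_trees", "sf_trees.csv")
trees_path

# Optional sanity check before reading: is the file where we expect it?
file.exists(trees_path)
```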
About the data: SF trees data are from the SF Open Data Portal. See more information from Thomas Mock and TidyTuesday here.
Check out the data using exploratory functions (e.g. View(), names(), summary(), etc.). Remember that those probably do not belong in your .Rmd code chunks (if you don’t need a record, you can either comment it out or put it in the Console).
dplyr review

Example 1: Find counts of observations by legal_status & wrangle a bit:
# Way 1: group_by %>% summarize %>% n
sf_trees %>%
group_by(legal_status) %>%
summarize(tree_count = n())
## # A tibble: 10 x 2
## legal_status tree_count
## <chr> <int>
## 1 DPW Maintained 141725
## 2 Landmark tree 42
## 3 Permitted Site 39732
## 4 Planning Code 138.1 required 971
## 5 Private 163
## 6 Property Tree 316
## 7 Section 143 230
## 8 Significant Tree 1648
## 9 Undocumented 8106
## 10 <NA> 54
# Way 2: Same thing (+ a few other dplyr functions)
top_10_status <- sf_trees %>%
count(legal_status) %>%
drop_na(legal_status) %>%
rename(tree_count = n) %>%
relocate(tree_count) %>%
slice_max(tree_count, n = 10) %>% # keep the 10 largest counts (replaces the superseded top_n())
arrange(-tree_count)
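To check that the two approaches agree, here is a sketch on a tiny made-up tibble. The toy data is hypothetical, not the real sf_trees. It also shows two shortcuts. count()'s name argument can replace the rename() step, and slice_max() keeps the rows with the largest values.

```r
library(dplyr)

# Hypothetical toy data standing in for sf_trees
toy <- tibble(
  legal_status = c("Private", "Private", "Private",
                   "Section 143", "Section 143", "Landmark tree", NA)
)

# Way 1: group_by() + summarize() + n()
way_1 <- toy %>%
  group_by(legal_status) %>%
  summarize(tree_count = n())

# Way 2: count() groups and counts in one step;
# name = sets the count column's name (the default is "n")
way_2 <- toy %>%
  count(legal_status, name = "tree_count")

# slice_max() keeps the rows with the largest tree_count, sorted high to low
top_2 <- way_2 %>%
  slice_max(tree_count, n = 2)

top_2
```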
Make a graph of top 10 from above:
ggplot(data = top_10_status, aes(x = fct_reorder(legal_status, tree_count), y = tree_count)) +
geom_col() +
labs(y = "Tree count", x = "Legal Status") +
coord_flip() +
theme_minimal()
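fct_reorder() is what puts the bars in order. It reorders the factor levels by tree_count, smallest first by default. With coord_flip(), the first level is drawn at the bottom, so the largest bar ends up on top. Here is a small sketch using a few labels and counts borrowed from the output above:

```r
library(forcats)

status <- c("Private", "Landmark tree", "Section 143")
tree_count <- c(163, 42, 230)

# Levels ordered by tree_count, smallest first (default .fun = median,
# which is just the value itself when each level has one number)
levels(fct_reorder(status, tree_count))
# c("Landmark tree", "Private", "Section 143")

# .desc = TRUE reverses the order
levels(fct_reorder(status, tree_count, .desc = TRUE))
# c("Section 143", "Private", "Landmark tree")
```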
Example 2: Only keep observations where legal status is Permitted Site and caretaker is MTA. Store as permitted_mta.
permitted_mta <- sf_trees %>%
filter(legal_status == "Permitted Site", caretaker == "MTA")
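Separating conditions with a comma inside filter() means *all* of them must hold, which is the same as using &. Here is a sketch on a few hypothetical rows that contrasts this with | ("or"):

```r
library(dplyr)

# Hypothetical rows covering the combinations we care about
toy <- tibble(
  legal_status = c("Permitted Site", "Permitted Site", "Private"),
  caretaker    = c("MTA", "Private", "MTA")
)

# Comma-separated conditions must ALL be TRUE (same as &)
with_comma <- toy %>% filter(legal_status == "Permitted Site", caretaker == "MTA")
with_and   <- toy %>% filter(legal_status == "Permitted Site" & caretaker == "MTA")

# Use | when EITHER condition is enough
with_or <- toy %>% filter(legal_status == "Permitted Site" | caretaker == "MTA")

nrow(with_comma)  # 1
nrow(with_or)     # 3
```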
Example 3: Only keep Blackwood Acacia trees, then only keep columns legal_status, date, latitude and longitude. Store as blackwood_acacia.
The stringr package contains a bunch of useful functions for finding & working with strings (e.g. words). One is str_detect(), which checks whether a specific string (or pattern) appears within each entry of a column.
blackwood_acacia <- sf_trees %>%
filter(str_detect(species, "Blackwood Acacia")) %>%
select(legal_status, date, latitude, longitude)
# Make a little graph of locations (note R doesn't know these are spatial)
ggplot(data = blackwood_acacia, aes(x = longitude, y = latitude)) +
geom_point()
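A few details about str_detect() are worth knowing. It returns one TRUE/FALSE per element. Matching is case-sensitive. The pattern is a regular expression, so characters like "(" have special meaning unless you wrap the pattern in fixed(). The strings below are made-up examples in the same "Scientific :: Common" format as the species column:

```r
library(stringr)

species <- c(
  "Acacia melanoxylon :: Blackwood Acacia",
  "Acacia baileyana :: Bailey's Acacia",
  "Tree(s) ::"
)

# Does the pattern appear anywhere in each element?
str_detect(species, "Blackwood Acacia")   # TRUE FALSE FALSE

# Case-sensitive by default ...
str_detect(species, "blackwood acacia")   # FALSE FALSE FALSE
# ... unless you ask otherwise
str_detect(species, regex("blackwood acacia", ignore_case = TRUE))  # TRUE FALSE FALSE

# "(" and ")" are regex metacharacters: "Tree(s)" as a regex means "Trees".
# fixed() matches the text literally instead.
str_detect(species, "Tree(s)")            # FALSE FALSE FALSE
str_detect(species, fixed("Tree(s)"))     # FALSE FALSE TRUE
```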
Example 4: Meet tidyr::separate()
Separate the species column into two new columns: spp_scientific and spp_common.
sf_trees_sep <- sf_trees %>%
separate(species, into = c("spp_scientific", "spp_common"), sep = " :: ")
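Here is what separate() does to each value, shown on two made-up rows. The spaces are part of sep = " :: ", so they don't end up in the new columns. By default, the original species column is dropped. In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim(), but separate() still works.

```r
library(dplyr)
library(tidyr)

# Made-up rows in the same "Scientific :: Common" format
toy <- tibble(species = c("Acacia melanoxylon :: Blackwood Acacia",
                          "Acacia baileyana :: Bailey's Acacia"))

# sep is read as a regular expression; " :: " has no special characters,
# so it matches literally
toy_sep <- toy %>%
  separate(species, into = c("spp_scientific", "spp_common"), sep = " :: ")

toy_sep$spp_scientific  # "Acacia melanoxylon" "Acacia baileyana"
toy_sep$spp_common      # "Blackwood Acacia" "Bailey's Acacia"
```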
Example 5: Meet tidyr::unite()
Yeah, it does the opposite. Unite the tree_id and legal_status columns, using a separator of “_COOL_” (no, you’d never actually do this…).
ex_5 <- sf_trees %>%
unite("id_status", tree_id:legal_status, sep = "_COOL_")
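Because unite() and separate() are opposites, you can round-trip between them. Here is a sketch with hypothetical tree IDs. unite() pastes the values together as text. separate(..., convert = TRUE) turns the ID back into a number.

```r
library(dplyr)
library(tidyr)

# Hypothetical tree IDs and statuses
toy <- tibble(
  tree_id = c(101, 102),
  legal_status = c("Permitted Site", "Private")
)

# unite() pastes the columns together and drops the originals
united <- toy %>%
  unite("id_status", tree_id:legal_status, sep = "_COOL_")
united$id_status  # "101_COOL_Permitted Site" "102_COOL_Private"

# separate() undoes it; convert = TRUE turns "101" back into a number
split_back <- united %>%
  separate(id_status, into = c("tree_id", "legal_status"),
           sep = "_COOL_", convert = TRUE)
```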
You need sf and tmap successfully attached to do this part. We’ll convert lat/lon to spatial data (notice that there’s now a column called geometry), then use geom_sf() to plot.
Step 1: Convert the lat/lon to spatial points
Use st_as_sf() to convert to spatial coordinates:
blackwood_acacia_sp <- blackwood_acacia %>%
drop_na(longitude, latitude) %>%
st_as_sf(coords = c("longitude","latitude")) # Convert to spatial coordinates
# But we need to set the coordinate reference system (CRS) so it's compatible with the street map of San Francisco we'll use as a "base layer":
st_crs(blackwood_acacia_sp) = 4326
# Then we can use `geom_sf`!
ggplot(data = blackwood_acacia_sp) +
geom_sf(color = "darkgreen") +
theme_minimal()
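As an alternative to setting the CRS in a separate step, st_as_sf() also accepts a crs argument. Note that coords must be given as c(x, y), which means longitude first. Here is a sketch with two made-up points:

```r
library(sf)

# Two made-up points near San Francisco
pts <- data.frame(
  longitude = c(-122.45, -122.41),
  latitude  = c(37.77, 37.80)
)

# coords = c(x, y): longitude, then latitude; crs = 4326 is WGS 84
pts_sf <- st_as_sf(pts, coords = c("longitude", "latitude"), crs = 4326)

st_crs(pts_sf)$epsg      # 4326
st_coordinates(pts_sf)   # matrix with X (longitude) and Y (latitude) columns
```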
But that’s not especially useful unless we have an actual map of SF to plot this on, right?
Read in the SF shapefile (data/sf_map/tl_2017_06075_roads.shp):
sf_map <- read_sf(here("data","sf_map","tl_2017_06075_roads.shp"))
sf_map <- st_transform(sf_map, 4326) # st_transform() returns a new object, so assign it back
sf_map
## Simple feature collection with 4087 features and 4 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -122.5136 ymin: 37.70813 xmax: -122.3496 ymax: 37.83213
## geographic CRS: WGS 84
## # A tibble: 4,087 x 5
## LINEARID FULLNAME RTTYP MTFCC geometry
## * <chr> <chr> <chr> <chr> <LINESTRING [°]>
## 1 110498938… Hwy 101 S O… M S1400 (-122.4041 37.74842, -122.404 37.7483, -…
## 2 110498937… Hwy 101 N o… M S1400 (-122.4744 37.80691, -122.4746 37.80684,…
## 3 110366022… Ludlow Aly … M S1780 (-122.4596 37.73853, -122.4596 37.73845,…
## 4 110608181… Mission Bay… M S1400 (-122.3946 37.77082, -122.3929 37.77092,…
## 5 110366689… 25th Ave N M S1400 (-122.4858 37.78953, -122.4855 37.78935,…
## 6 110368970… Willard N M S1400 (-122.457 37.77817, -122.457 37.77812, -…
## 7 110368970… 25th Ave N M S1400 (-122.4858 37.78953, -122.4858 37.78952,…
## 8 110498933… Avenue N M S1400 (-122.3643 37.81947, -122.3638 37.82064,…
## 9 110368970… 25th Ave N M S1400 (-122.4854 37.78983, -122.4858 37.78953)
## 10 110367749… Mission Bay… M S1400 (-122.3865 37.77086, -122.3878 37.77076,…
## # … with 4,077 more rows
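st_transform() returns a transformed copy instead of changing its input in place, so the result has to be assigned to be kept. The sketch below uses a made-up line. It is stored in EPSG:4269 (NAD83) on the assumption that TIGER/Line files like tl_2017_06075_roads.shp are typically distributed in NAD83. Check st_crs() on your own file to confirm.

```r
library(sf)

# Made-up line stored in NAD83 (EPSG:4269)
line_nad83 <- st_sfc(
  st_linestring(rbind(c(-122.45, 37.77), c(-122.41, 37.80))),
  crs = 4269
)

# st_transform() returns a new object; the input keeps its original CRS
line_wgs84 <- st_transform(line_nad83, 4326)

st_crs(line_nad83)$epsg  # 4269
st_crs(line_wgs84)$epsg  # 4326

# Quick check that a layer uses the CRS you expect before overlaying layers
st_crs(line_wgs84) == st_crs(4326)  # TRUE
```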
ggplot(data = sf_map) +
geom_sf()
Now combine them:
ggplot() +
geom_sf(data = sf_map,
size = 0.1,
color = "darkgray") +
geom_sf(data = blackwood_acacia_sp,
color = "red",
size = 0.5) +
theme_void() +
labs(title = "Blackwood acacias in San Francisco")
tmap_mode("view")
tm_shape(blackwood_acacia_sp) +
tm_dots()